Kunchang Li

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding

Jun 09, 2025

Emerging Properties in Unified Multimodal Pretraining

May 20, 2025

Seed1.5-VL Technical Report

May 11, 2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Mar 18, 2025

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Mar 15, 2025

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

Mar 10, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Dec 31, 2024

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Dec 26, 2024

Causal Diffusion Transformers for Generative Modeling

Dec 17, 2024